Version: 1.2.3

Chat LLM Proxy API

Overview

The Chat LLM Proxy project provides a set of APIs for interacting with various language models. This documentation outlines the available endpoints, request and response formats, and example usage to help developers integrate with the API effectively.

Table of Contents

  • API Endpoints
    • Generate Answer
    • Get Sources
    • Handle Continuation Request
    • Run Concurrently
    • Stream Generate Answer
  • Request Models Documentation

API Endpoints

Generate Answer

POST /api/generate_answer

Overview

The generate_answer API endpoint is responsible for generating a response based on the provided input parameters, utilizing various language models. It processes the request and returns a structured response containing the generated answer along with relevant metadata.

Request Parameters

  • request (GenerateAnswerRequest): The request object containing user input and configuration for generating the answer.
  • is_session (bool): Indicates whether the request is part of an ongoing session.
  • model_info (dict): A dictionary containing information about the model to be used for generating the answer.
  • prompt (str): The prompt or question for which the answer is to be generated.
  • tool_defns (list): A list of tool definitions that may be used in the answer generation process.
  • all_tools (dict): A dictionary containing all available tools for the request.
  • history_prompt (list): A list of previous prompts or messages in the conversation history.
  • query (str): The query string that may influence the answer generation.
  • gen_search_text (str): The generated search text that may be included in the response.
  • generated_chat_id (int): A unique identifier for the generated chat session.
  • api_key (str): The API key for authentication and authorization purposes.

Response Format

The response from the generate_answer API is a JSON object containing the following fields:

  • response (str): The generated answer based on the input prompt.
  • generated_search_text (str): The search text generated during the processing of the request.
  • finish_reason (str): The reason the response generation finished (e.g., "stop", "length").

Example Usage

Request
{
  "request": {
    "user_cred": {
      "token": "user_token",
      "client": {
        "tenant_id": "tenant_id"
      }
    },
    "task_process": {
      "service": "service_name"
    },
    "user_chat": {
      "query": "What is the capital of France?",
      "kvp": {}
    }
  },
  "is_session": true,
  "model_info": {
    "modelId": "azure",
    "modelVersion": "v1",
    "temperature": 0.7,
    "max_tokens": 150
  },
  "prompt": "What is the capital of France?",
  "tool_defns": [],
  "all_tools": {},
  "history_prompt": [],
  "query": "What is the capital of France?",
  "gen_search_text": "",
  "generated_chat_id": 12345,
  "api_key": "your_api_key"
}
Response
{
  "response": "The capital of France is Paris.",
  "generated_search_text": "",
  "finish_reason": "stop"
}
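
The endpoint can be exercised with any HTTP client. Below is a minimal sketch using Python's requests library; the proxy host is a placeholder assumption, and the payload mirrors the request example above:

import requests

# Mirrors the request body shown above; replace the placeholder host,
# token, and API key with real values.
payload = {
    "request": {
        "user_cred": {"token": "user_token", "client": {"tenant_id": "tenant_id"}},
        "task_process": {"service": "service_name"},
        "user_chat": {"query": "What is the capital of France?", "kvp": {}}
    },
    "is_session": True,
    "model_info": {"modelId": "azure", "modelVersion": "v1", "temperature": 0.7, "max_tokens": 150},
    "prompt": "What is the capital of France?",
    "tool_defns": [],
    "all_tools": {},
    "history_prompt": [],
    "query": "What is the capital of France?",
    "gen_search_text": "",
    "generated_chat_id": 12345,
    "api_key": "your_api_key"
}

resp = requests.post("https://<proxy-host>/api/generate_answer", json=payload)
print(resp.json()["response"])  # e.g. "The capital of France is Paris."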

Get Sources

POST /api/get_sources

Overview

The get_sources function retrieves various sources of information based on the user's request and bot details. It processes the input request and returns a structured dictionary containing relevant data.

Request Parameters

The function accepts the following parameters:

  • request (GenerateAnswerRequest): An object containing user request details, including user credentials and task process information.
  • bot_details (dict): A dictionary containing details about the bot, including llm_data, system_instruction_cache_key, and num_new_uploads.
  • chat_hist (list): A list of previous chat messages that may influence the current request.

Response Format

The function returns a tuple containing:

  • sources (dict): A dictionary with the following keys:

    • gen_search_text: The generated search text based on the context.
    • model_info: Information about the model being used.
    • examples: Example responses from the bot.
    • query: The processed query string.
    • all_input_texts: A dictionary containing filtered input texts.
    • llm_data: The data related to the language model.
    • history_prompt: The prompt history for the conversation.
    • num_new_uploads: The number of new uploads associated with the request.
  • bool: A boolean indicating the success or failure of the operation.

Example Usage

from app.controller.chat_classes import GenerateAnswerRequest

# Create a request object
request = GenerateAnswerRequest(
    user_cred=user_credentials,
    task_process=task_process_info,
    bot_config=bot_configuration,
    user_chat=user_chat_info,
    prev_context=previous_context
)

# Bot details
bot_details = {
    "llm_data": llm_data,
    "system_instruction_cache_key": "some_cache_key",
    "num_new_uploads": 2
}

# Chat history
chat_hist = [
    {"user_query": "What is the weather today?"},
    {"user_query": "Tell me about the news."}
]

# Call the get_sources function
sources, success = get_sources(request, bot_details, chat_hist)

# Output the sources
print(sources)
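
Because the second element of the returned tuple reports success or failure, callers will typically guard on it before using the sources. One possible pattern:

# Guard on the success flag before reading the collected sources.
if not success:
    raise RuntimeError("get_sources did not complete successfully")
print(sources["gen_search_text"])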

Handle Continuation Request

Description

The handle_continuation_request function processes incoming queries to determine if a continuation request is being made. Specifically, it checks if the query matches a predefined constant that indicates a request to continue from the last answer.

Parameters

  • query (str): The incoming message payload to check. This is the user input that may indicate a continuation request.

Returns

  • str: The processed query. If the input query matches the constant indicating continuation, it returns the string "continue". Otherwise, it returns the original query.

Example Usage

# Example of handling a continuation request
user_query = "$continue_answer"
processed_query = handle_continuation_request(user_query)
print(processed_query) # Output: "continue"

# Example with a regular query
user_query = "What is the weather today?"
processed_query = handle_continuation_request(user_query)
print(processed_query) # Output: "What is the weather today?"
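
For reference, a minimal sketch of the behavior described above. The constant's name is an assumption; its "$continue_answer" value is taken from the example:

CONTINUE_FROM_LAST_ANSWER = "$continue_answer"  # assumed constant name

def handle_continuation_request(query: str) -> str:
    # Return "continue" when the payload matches the continuation constant;
    # otherwise pass the original query through unchanged.
    if query == CONTINUE_FROM_LAST_ANSWER:
        return "continue"
    return query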

Run Concurrently

Purpose

The run_concurrently function executes two tasks, a semantic search and a text extraction, concurrently using a thread pool. Running the two independent tasks simultaneously reduces overall request latency.

Request Parameters

The function accepts the following parameters:

  • llm_data (dict): A dictionary containing data related to the language model, including any necessary configurations and inputs.
  • token (str): The authentication token used to access secured resources.
  • query (str): The query string that will be processed by the language model.
  • service (str): The service identifier that specifies which language model service to use.

Return Values

The function returns a tuple containing:

  • search_text_result (str): The result of the semantic search operation.
  • extracted_text_result (str): The result of the text extraction operation.

Example Usage

llm_data = {
    "context": "Sample context for processing.",
    "upload_files": [],
    # Additional necessary data...
}
token = "your_auth_token"
query = "What is the capital of France?"
service = "example_service"

search_text, extracted_text = run_concurrently(llm_data, token, query, service)

print("Search Text:", search_text)
print("Extracted Text:", extracted_text)

Stream Generate Answer

POST /api/stream_generate_answer

Overview

The stream_generate_answer API endpoint is designed to generate answers in a streaming manner based on the provided request parameters. This allows for real-time interaction and response generation, making it suitable for applications that require immediate feedback.

Request Parameters

  • request (GenerateAnswerRequest): The request object containing user credentials, query, and other necessary information.
  • is_session (bool): Indicates whether the request is part of an ongoing session.
  • model_info (dict): A dictionary containing information about the model being used, such as model ID and version.
  • prompt (str): The prompt to be used for generating the answer.
  • tool_defns (list): A list of tool definitions that may be used in the answer generation process.
  • all_tools (dict): A dictionary containing all available tools for the request.
  • history_prompt (list): A list of previous prompts to provide context for the current request.
  • query (str): The query string that the model will respond to.
  • gen_search_text (str): The generated search text that may be used in the response.
  • generated_chat_id (int): A unique identifier for the generated chat session.
  • api_key (str): The API key for authentication purposes.

Response Format

The response from the stream_generate_answer API is a stream of chunks, each containing the following structure:

  • generated_search_text (str): The search text generated during the answer generation process.
  • response (str): The generated answer from the model.
  • finish_reason (str): The reason the response generation finished (e.g., "stop", "length").

Example Usage

Request Example
{
  "request": {
    "user_cred": {
      "token": "your_token_here",
      "client": {
        "tenant_id": "your_tenant_id"
      }
    },
    "user_chat": {
      "query": "What is the capital of France?"
    }
  },
  "is_session": true,
  "model_info": {
    "modelId": "azure",
    "modelVersion": "v1"
  },
  "prompt": "Please provide the capital city of France.",
  "tool_defns": [],
  "all_tools": {},
  "history_prompt": [],
  "query": "What is the capital of France?",
  "gen_search_text": "",
  "generated_chat_id": 12345,
  "api_key": "your_api_key_here"
}
Response Example
{
  "generated_search_text": "The capital of France is Paris.",
  "response": "The capital of France is Paris.",
  "finish_reason": "stop"
}
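
A client can consume the stream chunk by chunk. Below is a minimal sketch using Python's requests library; the proxy host and the newline-delimited JSON framing are assumptions, and the payload mirrors the request example above:

import json
import requests

payload = {
    "request": {
        "user_cred": {"token": "your_token_here", "client": {"tenant_id": "your_tenant_id"}},
        "user_chat": {"query": "What is the capital of France?"}
    },
    "is_session": True,
    "model_info": {"modelId": "azure", "modelVersion": "v1"},
    "prompt": "Please provide the capital city of France.",
    "tool_defns": [],
    "all_tools": {},
    "history_prompt": [],
    "query": "What is the capital of France?",
    "gen_search_text": "",
    "generated_chat_id": 12345,
    "api_key": "your_api_key_here"
}

# stream=True keeps the connection open so chunks can be read as they arrive.
with requests.post("https://<proxy-host>/api/stream_generate_answer", json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line:
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)  # assumes one JSON chunk per line
        print(chunk["response"], end="", flush=True)
        if chunk.get("finish_reason") == "stop":
            break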

Request Models Documentation

This section describes the data models used in API requests to the chat LLM proxy. Each model defines the structure and field types of a request payload.

GenerateAnswerRequest

Description

The GenerateAnswerRequest model is used to encapsulate the data required to generate an answer from the chat LLM.

Properties

  • user_cred (UserCred): Contains user credentials.
  • bot_config (BotConfig): Configuration settings for the bot.
  • query (string): The input query for which an answer is to be generated.
  • task_process (TaskProcess): Information about the task being processed.
  • prev_context (string, optional): Previous context for continuation requests.
  • kvp (dict): Key-value pairs for additional parameters.

Example

{
  "user_cred": {
    "token": "user_token",
    "client": {
      "tenant_id": "tenant_id"
    }
  },
  "bot_config": {
    "caller_version": "v6",
    "tool_config": null
  },
  "query": "What is the capital of France?",
  "task_process": {
    "service": "chat_service"
  },
  "prev_context": null,
  "kvp": {
    "files": [
      {
        "file_name": "document1.txt",
        "source_category": "InputFile"
      }
    ]
  }
}

Other Models

UserCred

  • token (string): The authentication token for the user.
  • client (ClientInfo): Information about the client.

BotConfig

  • caller_version (string): The version of the bot being called.
  • tool_config (ToolConfig, optional): Configuration for tools used by the bot.

TaskProcess

  • service (string): The service being used for the task.

ClientInfo

  • tenant_id (string): The tenant ID associated with the client.

ToolConfig

  • (Define properties as needed based on your application requirements)
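
For orientation, a minimal sketch of how these models might be declared. Pydantic is an assumption here; the actual classes live in app.controller.chat_classes:

from typing import Optional
from pydantic import BaseModel

class ClientInfo(BaseModel):
    tenant_id: str

class UserCred(BaseModel):
    token: str
    client: ClientInfo

class ToolConfig(BaseModel):
    # Properties depend on your application requirements.
    pass

class BotConfig(BaseModel):
    caller_version: str
    tool_config: Optional[ToolConfig] = None

class TaskProcess(BaseModel):
    service: str

class GenerateAnswerRequest(BaseModel):
    user_cred: UserCred
    bot_config: BotConfig
    query: str
    task_process: TaskProcess
    prev_context: Optional[str] = None
    kvp: dict = {}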